AITopics | comprehensive dataset and benchmark

Supplementary Material for "AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery"

Neural Information Processing Systems, Oct-10-2025, 04:11:20 GMT

In Sec. 2 we include a ... We include a datasheet for our dataset following the methodology from "Datasheets for Datasets" (Gebru et al. [2021]). In this section, we include the prompts from Gebru et al. [2021] in blue, and in ...

For what purpose was the dataset created? Was there a specific task in mind? The dataset was created to facilitate research and development on cloud removal in satellite imagery. Specifically, our task is more temporally aligned than previous benchmarks.

Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., ... Who funded the creation of the dataset?

dataset, information, please provide, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Industry:

Law (1.00)
Government (0.68)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

Neural Information Processing Systems, May-27-2025, 04:54:35 GMT

Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks.
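To make the distinction concrete, here is a minimal sketch of a graph that stores free text on edges as well as nodes, in contrast to a TAG whose edges carry only a binary or categorical label. The class name `TextualEdgeGraph`, its methods, and the toy citation data are assumptions made for illustration; they are not the TEG-DB data format or API.

```python
from dataclasses import dataclass, field


@dataclass
class TextualEdgeGraph:
    """Toy container (hypothetical, not the TEG-DB format): text on nodes AND edges."""
    node_text: dict = field(default_factory=dict)  # node id -> description
    edge_text: dict = field(default_factory=dict)  # (src, dst) -> description

    def add_node(self, node_id, text):
        self.node_text[node_id] = text

    def add_edge(self, src, dst, text):
        # Require both endpoints so every edge text can be tied to node texts.
        if src not in self.node_text or dst not in self.node_text:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_text[(src, dst)] = text

    def context(self, node_id):
        """Return the texts of all edges that touch node_id, in insertion order."""
        return [t for (s, d), t in self.edge_text.items() if node_id in (s, d)]


# Made-up citation-network example: the edge text records how one paper cites another.
g = TextualEdgeGraph()
g.add_node("paper_a", "A survey of graph neural networks.")
g.add_node("paper_b", "A benchmark for node classification.")
g.add_edge("paper_a", "paper_b", "We evaluate on the benchmark introduced in paper_b.")
print(g.context("paper_b"))
```

With only a categorical edge attribute, the edge above would reduce to something like "cites"; the textual edge keeps the sentence describing the relationship.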

artificial intelligence, comprehensive dataset and benchmark, natural language, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.79)

AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Neural Information Processing Systems, May-27-2025, 03:02:21 GMT

Clouds in satellite imagery pose a significant challenge for downstream applications. A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce the largest public dataset for cloud removal, *AllClear*, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, demonstrating a scaling law (the PSNR rises from 28.47 to 33.87 with 30× more data), and conducting ablation studies on the temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth's surface and promote better cloud removal results.
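For readers unfamiliar with the metric, the sketch below computes PSNR under its standard definition, 10 · log10(MAX² / MSE), on flat lists of pixel values. The function name `psnr`, the assumed value range of [0, 1], and the toy pixel values are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, prediction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(prediction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.01, so MSE = 1e-4 and PSNR is about 40 dB.
clear = [0.2, 0.4, 0.6, 0.8]
restored = [0.21, 0.39, 0.61, 0.79]
print(round(psnr(clear, restored), 2))

# MSE ratio implied by the reported gain from 28.47 dB to 33.87 dB.
mse_ratio = 10 ** ((33.87 - 28.47) / 10)
print(round(mse_ratio, 2))
```

Under this definition, the reported 5.4 dB gain corresponds to roughly a 3.5× reduction in mean squared error, provided both scores use the same peak value.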

artificial intelligence, comprehensive dataset and benchmark, machine learning, (3 more...)

Neural Information Processing Systems

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.99)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Neural Information Processing Systems, May-26-2025, 16:26:22 GMT

Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the region of interest that could provide key information for answering the question is small. To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions. Additionally, about 98k of these pairs are annotated with detailed reasoning steps. Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts.
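As a rough illustration of what a bounding-box-annotated question-answer pair makes possible, the sketch below stores one example and crops the annotated region from a toy image, which is the kind of focusing step a multi-turn pipeline could apply before answering. The record type `VisualCoTExample`, its field names, the box convention, and the data are hypothetical; the abstract does not specify the dataset schema or the pipeline's implementation.

```python
from dataclasses import dataclass


@dataclass
class VisualCoTExample:
    """Hypothetical record: one QA pair plus the key region for answering it."""
    question: str
    answer: str
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels, max bounds exclusive


def crop(image, bbox):
    """Return the sub-image inside bbox; image is a list of pixel rows."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4x6 "image" whose pixel at row r, column c holds the value 10*r + c.
image = [[10 * r + c for c in range(6)] for r in range(4)]
example = VisualCoTExample(
    question="What is the top-left value of the key region?",
    answer="12",
    bbox=(2, 1, 4, 3),
)

# Step 1: focus on the annotated region. Step 2: answer from the crop alone.
region = crop(image, example.bbox)
print(region)
print(str(region[0][0]) == example.answer)
```

In a real system the box would be predicted by the model rather than read from the annotation, and the crop would be passed back to the model in a second turn.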

artificial intelligence, comprehensive dataset and benchmark, natural language, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction

Tan, Ruifeng, Hong, Weixiang, Tang, Jiayue, Lu, Xibin, Ma, Ruijun, Zheng, Xiang, Li, Jia, Huang, Jiaqiang, Zhang, Tong-Yi

arXiv.org Artificial Intelligence, Feb-26-2025

Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested in labs under a narrow range of conditions, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear whether models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering 2.4 times the sample size of the previous largest dataset, and provides the most diverse battery life resource with batteries from 8 formats, 80 chemical systems, 12 operating temperatures, and 646 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in a series of neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and that CyclePatch consistently improves model performance, establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.

battery, batterylife, dataset, (15 more...)

arXiv.org Artificial Intelligence

2502.18807

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Ontario > Toronto (0.05)
Africa > Rwanda > Kigali > Kigali (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
